Finance Data Project

In this project, we'll analyze financial data from several bank agencies. The data will be loaded directly from Google Stock prices.

Get the Data

Install pandas-datareader Pandas datareader to allow you to read stock data from web.


In [4]:
from pandas_datareader import data, wb
import pandas as pd
import numpy as np
import datetime
import seaborn as sns
%matplotlib inline

Data

We need to get data using pandas datareader. We will get stock information for the following banks:

  • Bank of America
  • CitiGroup
  • Goldman Sachs
  • JPMorgan Chase
  • Morgan Stanley
  • Wells Fargo

In [5]:
start_date = pd.to_datetime('2006-01-01')
end_date = pd.to_datetime('2016-01-01')

In [6]:
#Bank of America
BAC = data.DataReader('BAC', 'google', start_date, end_date)

In [7]:
#CitiGroup
C = data.DataReader('C', 'google', start_date, end_date)
C.head()


Out[7]:
Open High Low Close Volume
Date
2006-01-03 490.0 493.8 481.1 492.9 1537660
2006-01-04 488.6 491.0 483.5 483.8 1871020
2006-01-05 484.4 487.8 484.0 486.2 1143160
2006-01-06 488.8 489.0 482.0 486.2 1370250
2006-01-09 486.0 487.4 483.0 483.9 1680740

In [8]:
#Goldman Sachs
GS = data.DataReader('GS', 'google', start_date, end_date)
GS.head()


Out[8]:
Open High Low Close Volume
Date
2006-01-03 126.70 129.44 124.23 128.87 6188700
2006-01-04 127.35 128.91 126.38 127.09 4862000
2006-01-05 126.00 127.32 125.61 127.04 3717600
2006-01-06 127.29 129.25 127.29 128.84 4319600
2006-01-09 128.50 130.62 128.00 130.39 4723500

In [9]:
#JPMorgan Chase
JPM = data.DataReader('JPM', 'google', start_date, end_date)
JPM.head()


Out[9]:
Open High Low Close Volume
Date
2006-01-03 39.83 40.36 39.30 40.19 12839400
2006-01-04 39.78 40.14 39.42 39.62 13491800
2006-01-05 39.61 39.81 39.50 39.74 8109400
2006-01-06 39.92 40.24 39.55 40.02 7966900
2006-01-09 39.88 40.72 39.88 40.67 16575200

In [10]:
#Morgan Stanley
MS = data.DataReader('MS', 'google', start_date, end_date)
MS.head()


Out[10]:
Open High Low Close Volume
Date
2006-01-03 57.17 58.49 56.74 58.31 5377000
2006-01-04 58.70 59.28 58.35 58.35 7977800
2006-01-05 58.55 58.59 58.02 58.51 5778000
2006-01-06 58.77 58.85 58.05 58.57 6889800
2006-01-09 58.63 59.29 58.62 59.19 4144500

In [11]:
#Wells Fargo
WFC = data.DataReader('WFC', 'google', start_date, end_date)
WFC.head()


Out[11]:
Open High Low Close Volume
Date
2006-01-03 31.60 31.98 31.20 31.90 11016400
2006-01-04 31.80 31.82 31.36 31.53 10871000
2006-01-05 31.50 31.56 31.31 31.50 10158000
2006-01-06 31.58 31.78 31.38 31.68 8403800
2006-01-09 31.68 31.82 31.56 31.68 5619600

Create a list of the ticker symbols in alphabetical order


In [12]:
tickers = list(['BAC', 'C', 'GS', 'JPM', 'MS', 'WFC'])
tickers


Out[12]:
['BAC', 'C', 'GS', 'JPM', 'MS', 'WFC']

Use pd.concat to concatenate the bank dataframes together to a single data frame called bank_stocks. Set the keys argument equal to the tickers list. Also pay attention to what axis you concatenate on.


In [13]:
bank_stocks = pd.concat([BAC, C, GS, JPM, MS, WFC], axis=1, keys=tickers)
bank_stocks.head()


Out[13]:
BAC C ... MS WFC
Open High Low Close Volume Open High Low Close Volume ... Open High Low Close Volume Open High Low Close Volume
Date
2006-01-03 46.92 47.18 46.15 47.08 16296700 490.0 493.8 481.1 492.9 1537660 ... 57.17 58.49 56.74 58.31 5377000 31.60 31.98 31.20 31.90 11016400
2006-01-04 47.00 47.24 46.45 46.58 17757900 488.6 491.0 483.5 483.8 1871020 ... 58.70 59.28 58.35 58.35 7977800 31.80 31.82 31.36 31.53 10871000
2006-01-05 46.58 46.83 46.32 46.64 14970900 484.4 487.8 484.0 486.2 1143160 ... 58.55 58.59 58.02 58.51 5778000 31.50 31.56 31.31 31.50 10158000
2006-01-06 46.80 46.91 46.35 46.57 12599800 488.8 489.0 482.0 486.2 1370250 ... 58.77 58.85 58.05 58.57 6889800 31.58 31.78 31.38 31.68 8403800
2006-01-09 46.72 46.97 46.36 46.60 15620000 486.0 487.4 483.0 483.9 1680740 ... 58.63 59.29 58.62 59.19 4144500 31.68 31.82 31.56 31.68 5619600

5 rows × 30 columns

Set the column name levels (this is filled out for you):


In [14]:
bank_stocks.columns.names = ['Bank Ticker','Stock Info']

Check the head of the bank_stocks dataframe.


In [15]:
bank_stocks.head()


Out[15]:
Bank Ticker BAC C ... MS WFC
Stock Info Open High Low Close Volume Open High Low Close Volume ... Open High Low Close Volume Open High Low Close Volume
Date
2006-01-03 46.92 47.18 46.15 47.08 16296700 490.0 493.8 481.1 492.9 1537660 ... 57.17 58.49 56.74 58.31 5377000 31.60 31.98 31.20 31.90 11016400
2006-01-04 47.00 47.24 46.45 46.58 17757900 488.6 491.0 483.5 483.8 1871020 ... 58.70 59.28 58.35 58.35 7977800 31.80 31.82 31.36 31.53 10871000
2006-01-05 46.58 46.83 46.32 46.64 14970900 484.4 487.8 484.0 486.2 1143160 ... 58.55 58.59 58.02 58.51 5778000 31.50 31.56 31.31 31.50 10158000
2006-01-06 46.80 46.91 46.35 46.57 12599800 488.8 489.0 482.0 486.2 1370250 ... 58.77 58.85 58.05 58.57 6889800 31.58 31.78 31.38 31.68 8403800
2006-01-09 46.72 46.97 46.36 46.60 15620000 486.0 487.4 483.0 483.9 1680740 ... 58.63 59.29 58.62 59.19 4144500 31.68 31.82 31.56 31.68 5619600

5 rows × 30 columns

EDA

Reference: Multi-Level Indexing and Using .xs.

What is the max Close price for each bank's stock throughout the time period?


In [16]:
bank_stocks.xs('Close', level=1, axis=1).max()


Out[16]:
Bank Ticker
BAC     54.90
C      564.10
GS     247.92
JPM     70.08
MS      89.30
WFC     58.52
dtype: float64

Create a new empty DataFrame called returns. This dataframe will contain the returns for each bank's stock. returns are typically defined by:

$$r_t = \frac{p_t - p_{t-1}}{p_{t-1}} = \frac{p_t}{p_{t-1}} - 1$$

In [17]:
returns = pd.DataFrame()

In [18]:
bank_df = bank_stocks.xs('Close', level=1, axis=1)
bank_df.head()


Out[18]:
Bank Ticker BAC C GS JPM MS WFC
Date
2006-01-03 47.08 492.9 128.87 40.19 58.31 31.90
2006-01-04 46.58 483.8 127.09 39.62 58.35 31.53
2006-01-05 46.64 486.2 127.04 39.74 58.51 31.50
2006-01-06 46.57 486.2 128.84 40.02 58.57 31.68
2006-01-09 46.60 483.9 130.39 40.67 59.19 31.68

We can use pandas pct_change() method on the Close column to create a column representing this return value. Create a for loop that goes and for each Bank Stock Ticker creates this returns column and set's it as a column in the returns DataFrame.


In [19]:
for t in tickers:
    returns[t+' Return'] = bank_stocks[t]['Close'].pct_change()
returns.head()


Out[19]:
BAC Return C Return GS Return JPM Return MS Return WFC Return
Date
2006-01-03 NaN NaN NaN NaN NaN NaN
2006-01-04 -0.010620 -0.018462 -0.013812 -0.014183 0.000686 -0.011599
2006-01-05 0.001288 0.004961 -0.000393 0.003029 0.002742 -0.000951
2006-01-06 -0.001501 0.000000 0.014169 0.007046 0.001025 0.005714
2006-01-09 0.000644 -0.004731 0.012030 0.016242 0.010586 0.000000

Create a pairplot using seaborn of the returns dataframe


In [20]:
#returns_drop_na = returns.dropna()
sns.pairplot(returns[1:])


Out[20]:
<seaborn.axisgrid.PairGrid at 0x16412a284a8>

Using this returns DataFrame, figure out on what dates each bank stock had the best and worst single day returns. You should notice that 4 of the banks share the same day for the worst drop, did anything significant happen that day?


In [21]:
# Worst Drop (4 of them on Inauguration day)
returns.idxmin()


Out[21]:
BAC Return   2009-01-20
C Return     2011-05-06
GS Return    2009-01-20
JPM Return   2009-01-20
MS Return    2008-10-09
WFC Return   2009-01-20
dtype: datetime64[ns]

You should have noticed that Citigroup's largest drop and biggest gain were very close to one another, did anything significant happen in that time frame?


In [22]:
# Citigroup stock split in 2011-05
returns.idxmax()


Out[22]:
BAC Return   2009-04-09
C Return     2011-05-09
GS Return    2008-11-24
JPM Return   2009-01-21
MS Return    2008-10-13
WFC Return   2008-07-16
dtype: datetime64[ns]

Take a look at the standard deviation of the returns, which stock would you classify as the riskiest over the entire time period? Which would you classify as the riskiest for the year 2015?


In [23]:
returns.std()


Out[23]:
BAC Return    0.036650
C Return      0.179969
GS Return     0.025346
JPM Return    0.027656
MS Return     0.037820
WFC Return    0.030233
dtype: float64

In [24]:
returns['2015'].std()


Out[24]:
BAC Return    0.016163
C Return      0.015289
GS Return     0.014046
JPM Return    0.014017
MS Return     0.016249
WFC Return    0.012591
dtype: float64

Create a distplot using seaborn of the 2015 returns for Morgan Stanley


In [25]:
sns.distplot(returns['2015']['MS Return'], bins=100, color='green')


C:\Users\Luiz Henrique\Anaconda3\lib\site-packages\statsmodels\nonparametric\kdetools.py:20: VisibleDeprecationWarning:

using a non-integer number instead of an integer will result in an error in the future

Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0x1641593e668>

Create a distplot using seaborn of the 2008 returns for CitiGroup


In [26]:
sns.distplot(returns['C Return']['2008'], color='red')


C:\Users\Luiz Henrique\Anaconda3\lib\site-packages\statsmodels\nonparametric\kdetools.py:20: VisibleDeprecationWarning:

using a non-integer number instead of an integer will result in an error in the future

Out[26]:
<matplotlib.axes._subplots.AxesSubplot at 0x16416385588>

More Visualization

Imports


In [27]:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
%matplotlib inline

import plotly
import cufflinks as cf
cf.go_offline()


Create a line plot showing Close price for each bank for the entire index of time


In [28]:
close_stock_prices = bank_stocks.xs('Close', axis=1, level='Stock Info')
close_stock_prices.iplot(kind='line')



In [29]:
bank_stocks.xs('Close', axis=1, level='Stock Info').plot()


Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x1641742cf28>

Moving Averages

Plot the rolling 30 day average against the Close Price for Bank Of America's stock for the year 2008


In [30]:
BAC_30_2008_days = bank_stocks['2008']['BAC']['Close'].rolling(30).mean()
close_price_2008_BAC = bank_stocks['2008']['BAC']['Close']

BAC_30_2008_days.plot(figsize=(10,6), label='30 Days Average')
close_price_2008_BAC.plot(label='Close Price')
plt.legend()


Out[30]:
<matplotlib.legend.Legend at 0x16417564748>

Create a heatmap of the correlation between the stocks Close Price


In [31]:
close_price = bank_stocks.xs('Close', axis=1, level='Stock Info')
sns.heatmap(close_price.corr(), annot=True)


Out[31]:
<matplotlib.axes._subplots.AxesSubplot at 0x16417547e80>

Seaborn's clustermap to cluster the correlations together


In [32]:
sns.clustermap(close_price.corr(), annot=True)


Out[32]:
<seaborn.matrix.ClusterGrid at 0x164176507f0>

In [33]:
bank_stocks.xs(key='Close',axis=1,level='Stock Info').corr().iplot(kind='heatmap', colorscale='rdylbu')


Part 2

Using cufflinks to creat interactive plots

Create a candle plot of Bank of America's stock from Jan 1st 2015 to Jan 1st 2016


In [34]:
start_date = pd.to_datetime('2015-01-01')
end_date = pd.to_datetime('2016-01-01')
bank_stocks['BAC'][['Open', 'High', 'Low', 'Close']]['2015-01-01':'2016-01-01'].iplot(kind='candle')



In [ ]:

Create a Simple Moving Averages plot of Morgan Stanley for the year 2015.


In [35]:
bank_stocks['MS']['Close'].ix['2015-01-01':'2016-01-01'].ta_plot(study='sma', periods=[13,21,55],title='Simple Moving Averages')


Create a Bollinger Band Plot for Bank of America for the year 2015.


In [36]:
bank_stocks['BAC']['Close'].ix['2015-01-01':'2016-01-01'].ta_plot(study='boll')